Lexicon Generation by Extraction of Context Patterns
Abstract
Semantic browser technologies such as Magpie require the construction of lexicons to support the identification of terms in Web pages which are linked to a user’s chosen ontology. We frame the generation of such lexicons from ontologies as a problem of finding synonyms and hyponyms. Synonym finding using the hypothesis of semantic substitutability relies upon the discovery of patterns in which the target word occurs. Information extraction has the potential to find a range of patterns in text. We present a methodology for finding synonyms for inclusion in lexicons in this way and preliminary tests of the method using standard tools.
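The substitutability idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method or tooling: it assumes a crude regex tokenizer, a one-word window on each side as the "pattern", and a toy corpus. All function names (`tokenize`, `context_patterns`, `substitution_candidates`) are hypothetical. It records the contexts in which a target term occurs and then ranks other words by how many of those contexts they also fill.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphabetic tokens (deliberately crude;
    sentence boundaries are ignored)."""
    return re.findall(r"[a-z]+", text.lower())


def context_patterns(tokens, target, window=1):
    """Count the (left, right) contexts in which `target` occurs."""
    patterns = Counter()
    for i, tok in enumerate(tokens):
        if tok == target and i - window >= 0 and i + window < len(tokens):
            left = tuple(tokens[i - window:i])
            right = tuple(tokens[i + 1:i + 1 + window])
            patterns[(left, right)] += 1
    return patterns


def substitution_candidates(tokens, target, window=1):
    """Rank other words by the number of distinct target contexts they fill."""
    patterns = context_patterns(tokens, target, window)
    shared = defaultdict(set)
    for i in range(window, len(tokens) - window):
        word = tokens[i]
        if word == target:
            continue
        key = (tuple(tokens[i - window:i]),
               tuple(tokens[i + 1:i + 1 + window]))
        if key in patterns:
            shared[word].add(key)
    # Most shared contexts first; ties broken alphabetically.
    return sorted(((w, len(p)) for w, p in shared.items()),
                  key=lambda item: (-item[1], item[0]))


# Toy corpus, invented for illustration only.
corpus = (
    "The lecturer teaches the course. The professor teaches the course. "
    "A lecturer wrote the paper. A professor wrote the paper. "
    "The student reads the paper."
)
tokens = tokenize(corpus)
candidates = substitution_candidates(tokens, "lecturer")
print(candidates)
```

On this toy corpus only "professor" fills both contexts of "lecturer" ("the _ teaches" and "a _ wrote"), so it is the sole candidate. A realistic setup would need a much larger corpus, richer patterns than adjacent words, and a filter to separate synonyms from words that are merely related.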
Similar resources
A Semi-Supervised Framework Based on a Self-Constructed Adaptive Lexicon for Persian Opinion Analysis
With the appearance of Web 2.0 and 3.0, users’ contributions to the WWW have created a huge amount of valuable expressed opinions. Given the difficulty or impossibility of manually analyzing such big data, sentiment analysis, as a branch of natural language processing, has received considerable attention. Unlike for other (popular) languages, a limited number of research studies have been conducted in ...
Lexical Acquisition for Information Extraction from Arabic Text Documents
The objective of this work is to design a lexicon suitable for information extraction from Arabic texts, and to acquire this lexicon automatically for a specific domain from a set of electronic documents. To achieve this goal we have to find a way to represent the document as well as the domain knowledge, extract the document and domain knowledge, then design a lexicon suitable for IE tasks, and fi...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora
Previous work on bilingual lexicon extraction from comparable corpora aimed at finding a good representation for the usage patterns of source and target words and at comparing these patterns efficiently. In this paper, we try to work it out in another way: improving the quality of the comparable corpus from which the bilingual lexicon has to be extracted. To do so, we propose a measure of compa...
First Language Activation during Second Language Lexical Processing in a Sentential Context
Lexicalization patterns, the ways words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...
Publication date: 2004